
ABBSPO: Adaptive Bounding Box Scaling and Symmetric Prior based Orientation Prediction for Detecting Aerial Image Objects

Lee, Woojin, Chang, Hyugjae, Moon, Jaeho, Lee, Jaehyup, Kim, Munchurl

arXiv.org Artificial Intelligence

Weakly supervised oriented object detection (WS-OOD) has gained attention as a cost-effective alternative to fully supervised methods, providing both efficiency and high accuracy. Among weakly supervised approaches, horizontal bounding box (HBox)-supervised OOD stands out for its ability to directly leverage existing HBox annotations while achieving the highest accuracy under weak supervision settings. This paper introduces ABBSPO, a WS-OOD framework based on adaptive bounding box scaling and symmetry-prior-based orientation prediction. ABBSPO addresses the limitations of previous HBox-supervised OOD methods, which compare ground-truth (GT) HBoxes directly with the minimum circumscribed rectangles of predicted RBoxes and thus often yield inaccurate scale estimates. To overcome this, we propose: (i) Adaptive Bounding Box Scaling (ABBS), which scales each GT HBox to fit the size of the corresponding predicted RBox, ensuring more accurate scale prediction; and (ii) a Symmetric Prior Angle (SPA) loss that exploits the inherent symmetry of aerial objects for self-supervised learning, resolving the failure mode of previous methods in which learning collapses when the predictions for all three augmented views (original, rotated, and flipped) are consistently incorrect. Extensive experimental results demonstrate that ABBSPO achieves state-of-the-art performance, outperforming existing methods.
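The abstract's three-view scheme (original, rotated, flipped) rests on a simple geometric fact: a rotation shifts an object's orientation angle by the rotation amount, and a horizontal flip negates it. The minimal sketch below illustrates that consistency idea only; the function names and the actual SPA loss formulation are hypothetical stand-ins, not the paper's implementation.

```python
def expected_angle(theta, aug, param=0.0):
    # How an object's orientation angle transforms under a view
    # augmentation: rotation shifts it, horizontal flip negates it.
    if aug == "rotate":
        return theta + param
    if aug == "flip":
        return -theta
    return theta

def consistency_loss(pred_orig, pred_rot, pred_flip, rot):
    # Map the augmented-view predictions back to the original frame
    # and penalize disagreement with the original-view prediction.
    back_rot = pred_rot - rot
    back_flip = -pred_flip
    return abs(pred_orig - back_rot) + abs(pred_orig - back_flip)
```

A perfectly consistent triple (e.g. 0.5 rad in the original view, 0.8 rad after a 0.3 rad rotation, -0.5 rad after a flip) yields zero loss; the SPA loss described in the abstract adds a symmetry prior on top of this kind of self-consistency to keep learning from collapsing when all three views agree on a wrong angle.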


Supplementary of VRSBench: A Versatile Benchmark for Vision-Language Understanding of Remote Sensing Images

Neural Information Processing Systems

VRSBench consists of 29,614 remote sensing images with detailed captions, 52,472 object referring expressions, and 123,221 visual question-answer pairs. This section documents the dataset in accordance with best practices to ensure transparency, reproducibility, and ethical usage. Images_val.zip contains all raw images in the validation split. Model evaluation: the dataset can serve as a benchmark for comparing different vision-language models' performance on a standardized set of tasks. These annotations undergo manual review by human annotators.



Advancing Vision-based Human Action Recognition: Exploring Vision-Language CLIP Model for Generalisation in Domain-Independent Tasks

Shandilya, Utkarsh, Kappan, Marsha Mariya, Jain, Sanyam, Sharma, Vijeta

arXiv.org Artificial Intelligence

Human action recognition plays a critical role in healthcare and medicine, supporting applications such as patient behavior monitoring, fall detection, surgical robot supervision, and procedural skill assessment. While traditional models like CNNs and RNNs have achieved moderate success, they often struggle to generalize across diverse and complex actions. Recent advancements in vision-language models, especially the transformer-based CLIP model, offer promising capabilities for generalizing action recognition from video data. In this work, we evaluate CLIP on the UCF-101 dataset and systematically analyze its performance under three masking strategies: (1) percentage-based and shape-based black masking at 10%, 30%, and 50%, (2) feature-specific masking to suppress bias-inducing elements, and (3) isolation masking that retains only class-specific regions. Our results reveal that CLIP exhibits inconsistent behavior and frequent misclassifications, particularly when essential visual cues are obscured. To overcome these limitations, we propose incorporating class-specific noise, learned via a custom loss function, to reinforce attention to class-defining features. This enhancement improves classification accuracy and model confidence while reducing bias. We conclude with a discussion on the challenges of applying such models in clinical domains and outline directions for future work to improve generalizability across domain-independent healthcare scenarios.


Systematic Literature Review of Vision-Based Approaches to Outdoor Livestock Monitoring with Lessons from Wildlife Studies

Scott, Stacey D., Abbas, Zayn J., Ellid, Feerass, Dykhne, Eli-Henry, Islam, Muhammad Muhaiminul, Ayad, Weam, Kacmorova, Kristina, Tulpan, Dan, Gong, Minglun

arXiv.org Artificial Intelligence

Precision livestock farming (PLF) aims to improve the health and welfare of livestock animals and farming outcomes through the use of advanced technologies. Computer vision, combined with recent advances in machine learning and deep learning artificial intelligence approaches, offers a possible solution to the PLF ideal of 24/7 livestock monitoring that helps facilitate early detection of animal health and welfare issues. However, a significant number of livestock species are raised in large outdoor habitats that pose technological challenges for computer vision approaches. This review provides a comprehensive overview of computer vision methods and open challenges in outdoor animal monitoring. We include research from both the livestock and wildlife fields in the review because of the similarities in appearance, behaviour, and habitat for many livestock and wildlife. We focus on large terrestrial mammals, such as cattle, horses, deer, goats, sheep, koalas, giraffes, and elephants. We use an image processing pipeline to frame our discussion and highlight the current capabilities and open technical challenges at each stage of the pipeline. The review found a clear trend towards the use of deep learning approaches for animal detection, counting, and multi-species classification. We discuss in detail the applicability of current vision-based methods to PLF contexts and promising directions for future research.


Public Computer Vision Datasets for Precision Livestock Farming: A Systematic Survey

Bhujel, Anil, Wang, Yibin, Lu, Yuzhen, Morris, Daniel, Dangol, Mukesh

arXiv.org Artificial Intelligence

Technology-driven precision livestock farming (PLF) empowers practitioners to monitor and analyze animal growth and health conditions for improved productivity and welfare. Computer vision (CV) is indispensable in PLF by using cameras and computer algorithms to supplement or supersede manual efforts for livestock data acquisition. Data availability is crucial for developing innovative monitoring and analysis systems through artificial intelligence-based techniques. However, data curation processes are tedious, time-consuming, and resource intensive. This study presents the first systematic survey of publicly available livestock CV datasets (https://github.com/Anil-Bhujel/Public-Computer-Vision-Dataset-A-Systematic-Survey). Among the 58 public datasets identified and analyzed, encompassing different species of livestock, almost half are for cattle, followed by swine, poultry, and other animals. Individual animal detection and color imaging are the dominant application and imaging modality for livestock. The characteristics and baseline applications of the datasets are discussed, emphasizing the implications for animal welfare advocates. Challenges and opportunities are also discussed to inspire further efforts in developing livestock CV datasets. This study highlights that the limited quantity of high-quality annotated datasets collected from diverse environments, animals, and applications, together with the absence of contextual metadata, remains a real bottleneck in PLF.


Active Retrieval Augmented Generation

Jiang, Zhengbao, Xu, Frank F., Gao, Luyu, Sun, Zhiqing, Liu, Qian, Dwivedi-Yu, Jane, Yang, Yiming, Callan, Jamie, Neubig, Graham

arXiv.org Artificial Intelligence

Despite the remarkable ability of large language models (LMs) to comprehend and generate language, they have a tendency to hallucinate and create factually inaccurate output. Augmenting LMs by retrieving information from external knowledge resources is one promising solution. Most existing retrieval augmented LMs employ a retrieve-and-generate setup that only retrieves information once based on the input. This is limiting, however, in more general scenarios involving generation of long texts, where continually gathering information throughout generation is essential. In this work, we provide a generalized view of active retrieval augmented generation, methods that actively decide when and what to retrieve across the course of the generation. We propose Forward-Looking Active REtrieval augmented generation (FLARE), a generic method which iteratively uses a prediction of the upcoming sentence to anticipate future content, which is then utilized as a query to retrieve relevant documents to regenerate the sentence if it contains low-confidence tokens. We test FLARE along with baselines comprehensively over 4 long-form knowledge-intensive generation tasks/datasets. FLARE achieves superior or competitive performance on all tasks, demonstrating the effectiveness of our method. Code and datasets are available at https://github.com/jzbjyb/FLARE.
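The retrieval trigger described above (generate a tentative next sentence, and if it contains low-confidence tokens, use it as a query to retrieve documents and regenerate) can be sketched as a simple loop. Everything here is an illustrative stand-in: `generate_sentence`, `retrieve`, the canned confidence values, and the 0.6 threshold are hypothetical, not the authors' released implementation (see their repository for that).

```python
def generate_sentence(prompt, evidence=None):
    # Stand-in for an LM call returning a sentence and per-token
    # confidences; canned values stand in for real model scores.
    if evidence:
        return "Grounded sentence.", [0.95, 0.9]
    return "Tentative sentence.", [0.4, 0.9]

def retrieve(query):
    # Stand-in for a retriever over an external knowledge corpus.
    return ["relevant document for: " + query]

def flare_generate(question, max_sentences=3, threshold=0.6):
    answer = []
    for _ in range(max_sentences):
        prompt = question + " " + " ".join(answer)
        sentence, confs = generate_sentence(prompt)
        if min(confs) < threshold:
            # Low-confidence tokens found: use the tentative sentence
            # as the retrieval query, then regenerate with evidence.
            docs = retrieve(sentence)
            sentence, confs = generate_sentence(prompt, evidence=docs)
        answer.append(sentence)
    return " ".join(answer)
```

The key design point the sketch captures is that retrieval is decided per sentence during generation, rather than once up front from the input.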


De-confounding Representation Learning for Counterfactual Inference on Continuous Treatment via Generative Adversarial Network

Zhao, Yonghe, Huang, Qiang, Zeng, Haolong, Pen, Yun, Sun, Huiyan

arXiv.org Artificial Intelligence

Counterfactual inference for continuous rather than binary treatment variables is more common in real-world causal inference tasks. While there are already some sample reweighting methods based on the Marginal Structural Model for eliminating confounding bias, they generally focus on removing the treatment's linear dependence on confounders and rely on the accuracy of the assumed parametric models, which are usually unverifiable. In this paper, we propose a de-confounding representation learning (DRL) framework for counterfactual outcome estimation of continuous treatments that generates representations of covariates disentangled from the treatment variables. DRL is a non-parametric model that eliminates both linear and nonlinear dependence between treatment and covariates. Specifically, we train the correlations between the de-confounded representations and the treatment variables against the correlations between the covariate representations and the treatment variables to eliminate confounding bias. Further, a counterfactual inference network is embedded into the framework so that the learned representations serve both de-confounding and trusted inference. Extensive experiments on synthetic datasets show that the DRL model excels at learning de-confounding representations and outperforms state-of-the-art counterfactual inference models for continuous treatment variables. In addition, we apply the DRL model to the real-world medical dataset MIMIC and demonstrate a detailed causal relationship between red cell distribution width and mortality.
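The core quantity the abstract trains against is the correlation between each representation dimension and the continuous treatment. A minimal sketch of such a de-confounding objective, assuming a simple Pearson-correlation penalty (the paper's actual adversarial training and network architecture are not reproduced here):

```python
import numpy as np

def pearson_corr(a, b):
    # Pearson correlation of two 1-D arrays, with a small epsilon
    # to guard against zero variance.
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return float(np.mean(a * b))

def deconfound_loss(reps, treatment):
    # reps: (n_samples, d) learned covariate representations
    # treatment: (n_samples,) continuous treatment values
    # Sum of absolute correlations between each representation
    # dimension and the treatment; driving this toward zero
    # decorrelates the representation from the treatment.
    return sum(abs(pearson_corr(reps[:, j], treatment))
               for j in range(reps.shape[1]))
```

Representations that copy the treatment incur a loss near the number of dimensions, while treatment-independent representations score near zero, which is the disentanglement direction the framework optimizes toward.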


Employing Drones in Agriculture: An Exploration of Various Drone Types and Key Advantages

Nunes, E. C.

arXiv.org Artificial Intelligence

This article explores the use of drones in agriculture and discusses the various types of drones employed for different agricultural applications. Drones, also known as unmanned aerial vehicles (UAVs), offer numerous advantages in farming practices. They provide real-time and high-resolution data collection, enabling farmers to make informed irrigation, fertilization, and pest management decisions. Drones assist in precision spraying and application of agricultural inputs, minimizing chemical wastage and optimizing resource utilization. They provide access to otherwise hard-to-reach areas, reduce manual labor, and deliver cost savings and increased operational efficiency. Drones also play a crucial role in mapping and surveying agricultural fields, aiding crop planning and resource allocation. However, challenges such as regulations and limited flight time need to be addressed. The advantages of using drones in agriculture include precision agriculture, cost and time savings, improved data collection and analysis, enhanced crop management, accessibility and flexibility, environmental sustainability, and increased safety for farmers. Overall, drones have the potential to revolutionize farming practices, leading to increased efficiency, productivity, and sustainability in agriculture.


Stadiums Have Gotten Downright Dystopian

The Atlantic - Technology

Like so many cities before it, Phoenix went all out to host the Super Bowl earlier this month. Expecting about 1 million fans to come to town for the biggest American sporting event of the year, the city rolled out a fleet of self-driving electric vehicles to ferry visitors from the airport. Robots sifted through the trash to pull out anything that could be composted. There were less visible developments, too. In preparation for the game, the local authorities upgraded a network of cameras around the city's downtown--and have kept them running after the spectators have left.